Deepfakes are computationally-created entities that falsely represent reality. They can take image, video, and audio modalities, and pose a threat to many areas of systems and societies, comprising a topic of interest to various aspects of cybersecurity and cybersafety. In 2020 a workshop consulting AI experts from academia, policing, government, the private sector, and state security agencies ranked deepfakes as the most serious AI threat. These experts noted that since fake material can propagate through many uncontrolled routes, changes in citizen behaviour may be the only effective defence. This study aims to assess human ability to identify image deepfakes of human faces (StyleGAN2:FFHQ) from nondeepfake images (FFHQ), and to assess the effectiveness of simple interventions intended to improve detection accuracy. Using an online survey, 280 participants were randomly allocated to one of four groups: a control group, and 3 assistance interventions. Each participant was shown a sequence of 20 images randomly selected from a pool of 50 deepfake and 50 real images of human faces. Participants were asked if each image was AI-generated or not, to report their confidence, and to describe the reasoning behind each response. Overall detection accuracy was only just above chance and none of the interventions significantly improved this. Participants' confidence in their answers was high and unrelated to accuracy. Assessing the results on a per-image basis reveals participants consistently found certain images harder to label correctly, but reported similarly high confidence regardless of the image. Thus, although participant accuracy was 62% overall, this accuracy across images ranged quite evenly between 85% and 30%, with an accuracy of below 50% for one in every five images. We interpret the findings as suggesting that there is a need for an urgent call to action to address this threat.
translated by 谷歌翻译
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provide purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software-development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
translated by 谷歌翻译
在不失去先前学习的情况下学习新任务和技能(即灾难性遗忘)是人为和生物神经网络的计算挑战,但是人工系统努力与其生物学类似物达成平等。哺乳动物的大脑采用众多神经手术来支持睡眠期间的持续学习。这些是人工适应的成熟。在这里,我们研究了建模哺乳动物睡眠的三个不同组成部分如何影响人工神经网络中的持续学习:(1)在非比型眼运动(NREM)睡眠期间观察到的垂直记忆重播过程; (2)链接到REM睡眠的生成记忆重播过程; (3)已提出的突触降压过程,以调整信噪比和支持神经保养。在评估持续学习CIFAR-100图像分类基准上的性能时,我们发现将所有三个睡眠组件的包含在内。在以后的任务期间,训练和灾难性遗忘在训练过程中提高了最高准确性。尽管某些灾难性遗忘在网络培训过程中持续存在,但更高水平的突触缩减水平会导致更好地保留早期任务,并进一步促进随后培训期间早期任务准确性的恢复。一个关键的要点是,在考虑使用突触缩小范围的水平时,手头有一个权衡 - 更具侵略性的缩减更好地保护早期任务,但较少的缩减可以增强学习新任务的能力。中级水平可以在训练过程中与最高的总体精度达到平衡。总体而言,我们的结果都提供了有关如何适应睡眠组件以增强人工连续学习系统的洞察力,并突出了未来神经科学睡眠研究的领域,以进一步进一步进行此类系统。
translated by 谷歌翻译
尽管电子健康记录是生物医学研究的丰富数据来源,但这些系统并未在医疗环境中统一地实施,并且由于医疗保健碎片化和孤立的电子健康记录之间缺乏互操作性,可能缺少大量数据。考虑到缺少数据的案例的删除可能会在随后的分析中引起严重的偏见,因此,一些作者更喜欢采用多重插补策略来恢复缺失的信息。不幸的是,尽管几项文献作品已经通过使用现在可以自由研究的任何不同的多个归档算法记录了有希望的结果,但尚无共识,MI算法效果最好。除了选择MI策略之外,归纳算法及其应用程序设置的选择也至关重要且具有挑战性。在本文中,受鲁宾和范布伦的开创性作品的启发,我们提出了一个方法学框架,可以应用于评估和比较多种多个插补技术,旨在选择用于计算临床研究工作中最有效的推断。我们的框架已被应用于验证和扩展较大的队列,这是我们在先前的文献研究中提出的结果,我们在其中评估了关键患者的描述符和Covid-19的影响在2型糖尿病患者中的影响,其数据为2型糖尿病,其数据为2型糖尿病由国家共同队列合作飞地提供。
translated by 谷歌翻译
语言模型既展示了定量的改进,又展示了新的定性功能,随着规模的增加。尽管它们具有潜在的变革性影响,但这些新能力的特征却很差。为了为未来的研究提供信息,为破坏性的新模型能力做准备,并改善社会有害的效果,至关重要的是,我们必须了解目前和近乎未来的能力和语言模型的局限性。为了应对这一挑战,我们介绍了超越模仿游戏基准(Big Bench)。 Big Bench目前由204个任务组成,由132家机构的442位作者贡献。任务主题是多样的,从语言学,儿童发展,数学,常识性推理,生物学,物理学,社会偏见,软件开发等等。 Big-Bench专注于被认为超出当前语言模型的功能的任务。我们评估了OpenAI的GPT型号,Google内部密集变压器体系结构和大型基础上的开关稀疏变压器的行为,跨越了数百万到数十亿个参数。此外,一个人类专家评估者团队执行了所有任务,以提供强大的基准。研究结果包括:模型性能和校准都随规模改善,但绝对的术语(以及与评估者的性能相比);在模型类中的性能非常相似,尽管带有稀疏性。逐渐和预测的任务通常涉及大量知识或记忆成分,而在临界规模上表现出“突破性”行为的任务通常涉及多个步骤或组成部分或脆性指标;社交偏见通常会随着含糊不清的环境而随着规模而增加,但这可以通过提示来改善。
translated by 谷歌翻译
在本文中,我们研究了多语言句子嵌入的使用,以转移跨管辖区,法律制度(普通和民法),语言和域名的审判决策功能分割的预测模型(即语境)。利用原始环境之外的语言资源的机制在AI和法律中具有显着的潜在利益,因为法律制度,语言或传统之间的差异往往阻碍了更广泛的研究结果。我们使用跨语言可转换的门控复发单元(GRUS)分析使用语言无话句子表示的使用。调查不同背景之间的转移,我们开发了一种审判决策功能分割的注释方案。我们发现模特超出了他们接受培训的背景(例如,在美国的行政决定上培训的模型可以应用于意大利的刑法决定)。此外,我们发现在多种上下文上培训模型增加了鲁棒性并在评估先前看不见的上下文时提高整体性能。最后,我们发现,从所有上下文中汇集训练数据增强了模型的上下文性能。
translated by 谷歌翻译
为了在结构因果模型(SCM)中执行反事实推理,需要了解因果机制,它提供条件分布的因子,并将噪声映射到样本的确定性函数。遗憾的是,因象无法通过观察和与世界互动收集的数据唯一确定的因果机制,因此仍然存在如何选择因果机制的问题。最近的工作中,Oberst&Sontag(2019)提出了Gumbel-Max SCM,它由于直观上吸引的反事实稳定性而导致Gumbel-Max Reparameterizations作为因果机制。在这项工作中,我们认为选择在估算反事实治疗效果时最小化的定量标准的因果机制,例如最小化方差。我们提出了一个参数化的因果机制,概括了Gumbel-Max。我们表明他们可以接受培训,以最大限度地减少对感兴趣查询的分布的反事实效果方差和其他损失,从而产生比固定替代方案的反复治疗效果的较低方差估计,也推广到在培训时间未见的查询。
translated by 谷歌翻译
The popularity of online shopping is steadily increasing. At the same time, fake product reviewsare published widely and have the potential to affect consumer purchasing behavior. In response,previous work has developed automated methods for the detection of deceptive product reviews.However, studies vary considerably in terms of classification performance, and many use data thatcontain potential confounds, which makes it difficult to determine their validity. Two possibleconfounds are data-origin (i.e., the dataset is composed of more than one source) and productownership (i.e., reviews written by individuals who own or do not own the reviewed product). Inthe present study, we investigate the effect of both confounds for fake review detection. Using anexperimental design, we manipulate data-origin, product ownership, review polarity, and veracity.Supervised learning analysis suggests that review veracity (60.26 - 69.87%) is somewhat detectablebut reviews additionally confounded with product-ownership (66.19 - 74.17%), or with data-origin(84.44 - 86.94%) are easier to classify. Review veracity is most easily classified if confounded withproduct-ownership and data-origin combined (87.78 - 88.12%), suggesting overestimations of thetrue performance in other work. These findings are moderated by review polarity.
translated by 谷歌翻译
受到自然语言处理(NLP)中深度学习(DL)的成功的启发,我们应用了尖端DL技术,以预测战略时间地平线(4小时或更长时间)的飞行偏离需求。这项工作是为了支持斜切式的移动应用程序,PAIR,将预测的偏离需求显示为通用航空(GA)飞行运营商,因此他们可以更好地了解繁忙时期离开延误潜力的意识。涉及Pacer以前设计的基于规则的预测方法的现场示范表明,离职需求的预测准确性仍然具有改进的空间。本研究致力于提高来自两个关键方面的预测精度:更好的数据源和鲁棒预测算法。我们利用了两个数据来源,航空系统性能指标(ASPM)和系统广播信息管理(游泳)作为我们的输入。然后,我们用DL序列技术培训了预测模型,以序列(SEQ2Seq)和SEQ2Seq的注意力。案例研究表明,我们的SEQ2Seq在测试的四种预测算法中具有最佳。此外,与经典的自回归(AR)预测方法相比,通过更好的数据源,SEQ2Seq的注意力可以减少超过60%的平均平方误差(MSE)超过60%。
translated by 谷歌翻译
In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantic via program invariants while it also captures program syntax via language semantic learned from large code corpus using the pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that a APR-generated patch overfits if: (1) it violates correct specifications or (2) maintains errors behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a trained model from labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminant capability. Second, INVALIDATOR does not require new test cases to be generated but instead only relies on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experiment results show that INVALIDATOR correctly classified 79% overfitting patches, accounting for 23% more overfitting patches being detected by the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
translated by 谷歌翻译